I have a question, guys. I was trying to SFT a qwen2.5-coder model starting from qwen2.5-instruction-7b. My task is NL2SQL (from a human prompt to SQL code). I used 10k training samples and the SFT went well: I got about a 5% accuracy increase (sft-qwen2.5-coder-7b vs. base qwen2.5-coder-7b). However, I also tried training a thinking-qwen2.5-coder-7b. I first used o1 to generate some CoT data like this:
As you can see in the figure, I use `<|image_pad|>` and `<|video_pad|>` to replace `<think>` and `</think>`. At inference time, I put `<|image_pad|>` right after the prompt and hope the model generates the thinking process and then the answer. A rough sketch of this setup is below.
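To make it concrete, here is roughly how I build one training sample and the inference prompt. The helper names and the example question/CoT/SQL are made up just for illustration; this is only a sketch of the format, not my actual training pipeline:

```python
# Sketch of the data format: <|image_pad|> / <|video_pad|> stand in for
# <think> / </think>, since they are existing special tokens in the
# qwen2.5-coder tokenizer that my text-only task never uses otherwise.

THINK_OPEN = "<|image_pad|>"
THINK_CLOSE = "<|video_pad|>"


def build_training_sample(prompt: str, cot: str, sql: str) -> dict:
    """One SFT sample: the target wraps the o1-generated CoT in the pad tokens,
    followed by the final SQL answer."""
    return {
        "prompt": prompt,
        "response": f"{THINK_OPEN}\n{cot}\n{THINK_CLOSE}\n{sql}",
    }


def build_inference_prompt(prompt: str) -> str:
    """At inference I append the opening pad token so the model starts 'thinking'."""
    return prompt + THINK_OPEN


# Illustrative example (schema and question are invented):
sample = build_training_sample(
    prompt="List the names of customers who placed more than 3 orders.",
    cot="orders links to customers via customer_id; group by customer and count rows.",
    sql="SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name HAVING COUNT(*) > 3;",
)
print(build_inference_prompt(sample["prompt"]))
```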
I trained with 2500 of these "thinking" samples (far fewer than the 10k normal SFT samples) for 50000 steps, and the model's accuracy came out even lower than base qwen2.5-coder-7b. The model does produce a thinking process, but the answers are just not as good as sft-qwen2.5-coder-7b's. I'd really like to know the reason, or what mistakes I made. Thanks, guys, really appreciate it.